(54) Title: INTERACTIVE VIDEO SYSTEM

(57) Abstract 

An interactive television system including 
means for identifying video cameras, means for 
tracking a camera position and orientation and 
field of view size (position in x, y, z, zoom, 
tilt, roll, pan) and means for tracking static 
and moving objects in three-dimensional space 
presented in the captured scene. In addition 
the system comprises means to combine data 
related to the video, the camera identification, 
the camera position and orientation and field of 
view size, the objects tracking information and 
the video image. The system further comprises 
means to transmit the combined data to the
viewer together with a synchronisation time 
code, means to point to objects appearing in the 
captured three dimensional scene, or means to 
navigate in a three-dimensional scene, means to 
identify marked objects or points and track them 
with time, means to present additional data or to 
manipulate the video image. 




WO 00/28731 PCT/GB99/03514

INTERACTIVE VIDEO SYSTEM 



The present invention relates to video and television systems, and
more particularly to such systems that enable the viewer to interact with a
video image of a three dimensional scene, such that pointing to an
intrinsic point or object in the image activates visual and/or acoustic
effects that are presented to the viewer.

The present invention describes a system that comprises means for 
identifying the video cameras, means for tracking the camera's position 
and orientation and field of view size (position in x,y,z, zoom, tilt, roll, 
pan) and means for tracking static and moving objects in three dimensional 
space presented in the captured scene. In addition the system comprises 
means to combine data related to the video, the camera's identification, 
the camera's position and orientation and field of view size, the objects 
tracking information and the video image. The system further comprises 
means to transmit the combined data to the viewer together with a 
synchronisation time code. In addition the system comprises means to
receive the transmitted data and means to enable the viewer to interact 
with the received data, in particular means to point to objects appearing in 
the captured three dimensional scene, or means to navigate in a three 
dimensional scene. In addition the system comprises means to identify 
marked objects or points and track them with time using the received 
camera's position and orientation and field of view size. The system also 
comprises means to present additional data that was transmitted with the 
video image, or to manipulate the video image at the request of the 
viewer. In addition the system comprises means that enables the viewer to 
transmit data back to the broadcaster.

Tracking of camera position and orientation and field of view size
(position in x,y,z, zoom, tilt, roll, pan) is a common technique used
especially in virtual studios or electronic advertising in sport. Knowing the
camera position and orientation and field of view size enables a studio
controller to perform several actions automatically, such as replacing a chroma
key billboard in a three dimensional scene, tracking static objects and
combining additional foreground objects, real or computer generated,
into a video scene. There are three main techniques to capture the
camera position and orientation and field of view size. The first one is 
based on electro-mechanical or electro-optical sensors that are located on 
the camera and measure the rotation axes (tilt, roll and pan) and the status 
of the zoom and focus engine. The second one is based on image 
processing of the video sequence, and this can be done by pattern 
recognition of a visible pattern in the image or by calculating the relative 
correlation from frame to frame. The third technique is by tracking the 
motion of markers placed on the camera. By knowing the camera position 
and orientation and field of view size it is possible to find automatically 
the exact position of any object in the video image at any time using initial 
positioning data at a certain time, regardless of any change of the camera 
position and orientation and field of view size (position in x,y,z, zoom, 
pan, tilt, roll) during the video film sequence. 

Present known systems such as described in WO 95/10915 allow an 
object to be tracked by identification of the object at the transmitting end. 
These are used in television video replays to enable an object to be 
highlighted. Tracking moving objects in a video sequence is mainly used
in sport broadcasts or in virtual studios where the position of the actors is
important. There are two main techniques to track an object in a scene:
first by image processing, and second by tracking sensors, markers,
receivers or any marking tag, by either optical or electronic means.
Tracking the position of the object in the video image enables
manipulation of the image with regard to the object position and display of
features like object highlight, path marking, panoramic view of the object
path and addition of text near the object.

Traditional television or video broadcast involves a one-way flow
of image information from video suppliers such as television stations,
cable networks, Internet and local video suppliers to the watching
audience. The watcher receives the video data via a television set or other
type of visualisation media, like computers, in a passive way that does not
allow the viewer to influence the image appearing in the visualisation
media.

Together with the conventional broadcast there are basic interactive
systems that enable the watcher to receive text information, as in teletext, or
to point at specific areas in the two dimensional image, as in webTV, to
activate basic processes like opening a web page or displaying text. At
present there is no system that can react to pointing at intrinsic objects or
points captured with a non static video camera, or moving objects
captured with a static camera. Nor is there at present a system that
transmits to the watcher combined information on the camera's position
and orientation and field of view size on one hand and positioning of
objects and selected points in the scene together with the image data on
the other, with a receiving system located on the watcher side to receive
the information and enable the watcher to interact with objects and points
in the transmitted video.



The problem with such systems is that the viewer has no control
over the highlighted or tracked object. It is an object of the present 
invention to provide an interactive video and television system which will 
enable individual viewers to operate the system in several ways: First to 
get more information related to stable and moving objects by pointing on 
them. The information can be textual or visual details relating to the 
object. Second, to select objects from the overall scene and activate a 
feedback action appearing on the screen or sending the identity of the 
selected object back to the broadcaster, for example a multiple choice 
answer in an education program, active voting or automatically purchasing 
of objects. Third to initiate actions by pointing to an object or a specific 
point or area in the image which will result in enhancement and 
modification of the image in a desired manner, or even presenting 
additional images together with the original image, for example 
highlighting objects, adding graphic overlays, opening new video windows 
and changing textures and colours. Each viewer may initiate different 
actions to suit a particular requirement and thus the system enables 
viewers to control data on the screen relating to their own selection of 
events. 

In a specific embodiment, the present invention relates to 
interactive abilities of a video watcher and more particularly the possibility 
of the video watcher to interact with the video image in a way that 
pointing out on a point or an object in the captured scene image will 
activate an action that will be heard or appear in the image. 

In a preferred embodiment the present invention provides a system
that identifies the video cameras, tracks the camera's position and
orientation and field of view size, tracks any static or moving objects
presented in the scene and also produces a synchronisation time code.
Transmission of this information together or in parallel with the video
signal enables identification of the co-ordinates in the image of objects or
selected points appearing in the video image in the real or a virtual world.

The present invention therefore provides an interactive video and 
television system including video camera means comprising one or more 
video cameras for video shooting/recording a scene at an original scene 
location, means to identify the video cameras, means for capturing the 
camera position and orientation and field of view size in each video field, 
means for identifying the location of one or more objects and generating 
data relating to the location of said one or more objects in the captured 
scene which are present in the scene for at least a period of time, means 
for recording or transmitting the video recorded images to enable a viewer 
to view the scene at a location remote from the original scene location, 
said means including means for recording or transmitting the data relating 
to the camera's position and orientation and field of view size and the 
identity and location of said one or more objects to enable said viewer at 
said remote location to interrogate said data, means to generate a time 
code to synchronise between the video image and the data related to this 
image. 

Preferably, the data relating to camera position and orientation and
field of view size, with or without the location of said one or more
objects, is transmitted together with the video signal and a synchronisation
time code. In a specific embodiment the data may be transmitted in the
vertical blanking interval or by any other coded means in each video field.
Alternatively the data relating to the location of said one or more objects
may be transmitted separately by external means separate from the video
signal via, for example, an additional transmission channel.



In a preferred embodiment the data concerning the position of an
object is transmitted only in a first video frame and the camera position
and orientation and field of view size and a time code are preferably
transmitted in every frame. The receiving apparatus records the initial
position of the object in a first, specified frame, with the camera position
and orientation and field of view size pertaining in that frame. The
receiving apparatus can thereafter determine the position of any object in
any succeeding frame by reference to the camera position and orientation
and field of view size.
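
The recomputation described above can be illustrated with a minimal pinhole-camera sketch in Python. It is not taken from the application: the axis conventions (camera looking along +Z with Y pointing down at zero pan, tilt and roll), the treatment of zoom as a horizontal field-of-view angle, the 720 x 576 raster and the names `CameraState` and `project` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraState:
    """Per-frame camera data (hypothetical layout; angles in radians)."""
    x: float
    y: float
    z: float
    pan: float   # rotation about the vertical (Y) axis
    tilt: float  # rotation about the camera's horizontal axis, positive = up
    roll: float  # rotation about the optical axis (sign convention assumed)
    hfov: float  # horizontal field of view, standing in for the zoom setting


def project(cam, point, width=720, height=576):
    """Map a world point (x, y, z) to pixel coordinates, or None if behind the camera."""
    dx, dy, dz = point[0] - cam.x, point[1] - cam.y, point[2] - cam.z
    # Undo the pan: rotate the offset back about the vertical axis.
    cp, sp = math.cos(cam.pan), math.sin(cam.pan)
    x1, z1 = cp * dx - sp * dz, sp * dx + cp * dz
    # Undo the tilt: rotate back about the camera's horizontal axis.
    ct, st = math.cos(cam.tilt), math.sin(cam.tilt)
    y2, z2 = ct * dy + st * z1, -st * dy + ct * z1
    # Undo the roll: rotate back about the optical axis.
    cr, sr = math.cos(cam.roll), math.sin(cam.roll)
    x3, y3 = cr * x1 + sr * y2, -sr * x1 + cr * y2
    if z2 <= 0:
        return None
    # Focal length in pixels follows from the field of view.
    focal = (width / 2) / math.tan(cam.hfov / 2)
    return (width / 2 + focal * x3 / z2, height / 2 + focal * y3 / z2)


# Initial position of a static object, sent once (assumed world units).
goal_post = (10.0, 0.0, 10.0)
# Camera data sent with every frame: here the camera pans towards the object.
frames = [
    CameraState(0.0, 0.0, 0.0, pan=0.0, tilt=0.0, roll=0.0, hfov=math.pi / 2),
    CameraState(0.0, 0.0, 0.0, pan=math.pi / 4, tilt=0.0, roll=0.0, hfov=math.pi / 2),
]
for cam in frames:
    print(project(cam, goal_post))
```

In this sketch only the camera state changes from frame to frame; the object position is stored once, which mirrors the embodiment in which object data is transmitted only in a first frame.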

In another embodiment the position of the objects is first calculated
and transmitted in every frame together with the image data, the camera
position and orientation and field of view size and time code.

The broadcast system may in a further embodiment also transmit
"hidden" data which will not be seen automatically by the viewer. The
viewer can, however, select such data by, for example, clicking on an
object on the screen. Because the processor in the receiving apparatus
knows the position of each object on the TV screen, by virtue of the initial
position information and subsequent camera position and orientation and
field of view size information, it is possible to identify the marked object.
The hidden data relating to that object can be recovered for display,
for example, as an overlay, on a split screen display, as an insert or on a
separate monitor if provided.
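
One way the receiving processor might identify a marked object is sketched below in Python. This is an assumption-laden illustration, not the method of the application: objects are treated as single screen points, the selection radius of 30 pixels is arbitrary, and `pick_object`, `screen_positions` and `hidden_data` are hypothetical names.

```python
import math


def pick_object(click, screen_positions, radius=30.0):
    """Return the id of the object nearest to the click, or None if none is within radius."""
    best_id, best_dist = None, radius
    for object_id, (u, v) in screen_positions.items():
        dist = math.hypot(u - click[0], v - click[1])
        if dist <= best_dist:
            best_id, best_dist = object_id, dist
    return best_id


# Current screen positions, as the processor would derive them for this frame.
screen_positions = {22: (100.0, 100.0), 29: (300.0, 200.0)}
# Hidden data received with the broadcast but not shown by default (made-up content).
hidden_data = {
    22: "Player 22: hypothetical statistics",
    29: "Billboard 29: hypothetical advertiser details",
}

selected = pick_object((110.0, 95.0), screen_positions)
if selected is not None:
    print(hidden_data[selected])  # would be drawn as an overlay or insert
```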

The system therefore enables objects in a 3D world to be
interrogated as if they were in a 2D world.



The system also comprises receiving apparatus including a 
processor unit and a display unit that may be separate or may preferably 
be combined into a single unit. The processing unit preferably comprises 
graphical video and computational capabilities enabling overlay of 
computer generated graphics and computation of video data combined with 
computed information. 

The system therefore enables the viewer to generate data in relation 
to selected objects, either live or in a video replay situation, because the 
data relating to location of objects on a video sequence is available to the 
viewer at a remote location. 

Preferably the receiving apparatus can transmit data relating to an 
object displayed on said TV to indicate a preference to a broadcaster or 
intermediate party. 

Embodiments of the present invention will now be described, by 
way of example, with reference to the accompanying drawings in which :- 

Figure 1 shows a video transmitting equipment in accordance with 
the present invention; 

Figure 2 shows a video receiving equipment in accordance with the 
present invention for co-operation with the transmission apparatus of 
Figure 1;

Figure 3 shows an alternative embodiment of the present invention
illustrating a transmission system; and

Figure 4 shows a receiving system suitable for the transmission
system of Figure 3.



The present invention will now be described with reference to the 
drawings. The advantages and inventive features of the present invention 
will become apparent by a comparison with known television systems in 
which camera tracking is used. 

The present invention proposes a new configuration for transmitting
data for all types of video broadcast and display. The new configuration is
characterised by combining ordinary video data, information on the
camera position and orientation and field of view size, object
positioning and any information relating to objects in the video image or to
the video image in general. For each image in the video sequence the
following information is attached: camera identification number, camera
position and orientation and field of view size for the specific frame, and
positioning or displacement of selected moving or static objects in the
captured scene. For static objects or points the initial position can be sent
only once; for the following frames the relevant position of the objects in
the image can be calculated using the camera position and orientation and
field of view size (position in x,y,z, pan, tilt, roll and zoom) attached to
each frame. The additional information regarding the camera and the selected
objects can on one hand be integrated with the image by encoding the
information into the image, or encoding the information into the vertical
blanking interval (VBI), or by other means to transmit the information
together with the image data. On the other hand it can be sent by external
means separate from the video signal via an additional transmission channel,
such as telephone wires, Internet, special television channels, cable
channels or any other transporting means. The camera and object
information should be synchronised with the video sequence in such a way
that the receiver side can use the transmitted information for the
correct video image.
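
The per-frame information listed above can be pictured as a small binary record. The Python sketch below is only one possible layout, chosen for illustration: the field widths, the little-endian byte order, the use of 32-bit floats and the names `FrameRecord`, `encode` and `decode` are assumptions, and the application does not specify any particular encoding.

```python
import struct
from dataclasses import dataclass, field

# Assumed layout: camera id (uint16), time code (uint32),
# x, y, z, pan, tilt, roll, zoom (7 x float32), object count (uint8).
_HEADER = struct.Struct("<HI7fB")
# Assumed layout per object: object id (uint16), x, y, z (3 x float32).
_OBJECT = struct.Struct("<H3f")


@dataclass
class FrameRecord:
    camera_id: int
    time_code: int
    camera: tuple                      # (x, y, z, pan, tilt, roll, zoom)
    objects: dict = field(default_factory=dict)  # object id -> (x, y, z)


def encode(record):
    """Serialise one frame's camera and object data to bytes."""
    payload = _HEADER.pack(record.camera_id, record.time_code,
                           *record.camera, len(record.objects))
    for object_id, position in sorted(record.objects.items()):
        payload += _OBJECT.pack(object_id, *position)
    return payload


def decode(payload):
    """Rebuild a FrameRecord from bytes produced by encode()."""
    fields = _HEADER.unpack_from(payload, 0)
    camera_id, time_code = fields[0], fields[1]
    camera, count = tuple(fields[2:9]), fields[9]
    objects = {}
    offset = _HEADER.size
    for _ in range(count):
        object_id, x, y, z = _OBJECT.unpack_from(payload, offset)
        objects[object_id] = (x, y, z)
        offset += _OBJECT.size
    return FrameRecord(camera_id, time_code, camera, objects)


record = FrameRecord(10, 1234, (1.5, 2.0, 3.25, 0.5, -0.25, 0.0, 2.0),
                     {22: (4.0, 0.0, 8.5), 24: (-1.0, 0.0, 2.0)})
print(len(encode(record)), "bytes per frame in this sketch")
```

A compact record of this kind could travel with the picture or over a separate channel; in either case the time code is what lets the receiver attach it to the right frame.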

The video data together with information on the camera
identification, camera position and orientation and field of view size, the
objects' position or displacement, synchronisation time code and any
information relating to objects in the video image or to the video image in
general are received by a processing unit that may be separate from or
combined with the displaying unit. The processing unit also comprises
graphical, video and computational abilities that enable manipulation
of the received video data and/or overlay of additional images or graphics
onto the video data.

Preferred embodiments of the present invention will now be
described:

Figure 1 shows a transmission system comprising a first video
camera 10 which is positioned to video a scene 20 comprising a football
pitch with players 22, 24, goal post 26 and penalty area 28. Other features
on the pitch may be used but particular reference will be made to these
features by way of example.

The video camera 10 is preferably equipped with camera position
and orientation and field of view size (position in x,y,z, pan, tilt, roll and
zoom) sensor means 12 which senses the required camera position and
orientation and field of view size as described hereinbefore.

The camera is mounted on a tripod 14 to enable the camera to pan
and tilt in addition to its normal zooming function.



The video output of camera 10 is fed via cable 16 to a combiner 
circuit 18 to which is also connected the data output from sensor means 12 
via cable 19. 

The output of combiner circuit 18 is used as a transmitted video 
output 184. Alternatively, the camera data from sensor means 12 may be 
connected to a buffer 190 which may also act as a coder and transmitter to 
transmit the camera data separately via output 192 or to forward the data 
to a further combiner circuit 196 via line 194 where the data may be
combined as described hereafter. 

The outputs 184 (video) and 192 (camera position and orientation 
and field of view size) may be transmitted to a viewer to provide the 
viewer with information allowing the viewer to manipulate the received 
video signal. The camera 10 will transmit the view of the pitch 20 which 
includes several feature points such as the goal 26 and penalty box 28. 
Other features such as the corners, halfway line etc. may be used. By
analysis of the video recorded scene and with knowledge of the camera 
position and orientation and field of view size, the video image may be 
manipulated by the viewer remotely, as explained with reference to the 
receiver circuitry of Figure 2. 

The camera data may be combined with the video data, as in output 
184, or may be transmitted separately as in output 192. 



If transmitted separately then it will be necessary to ensure
synchronism between the video and camera data at the receiver in order to
ensure that the camera data is correctly synchronised for each video
frame. The viewer can then, for example, by identifying an object in one
frame enable the system to automatically track the object through a
succession of frames.
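
When the two streams arrive separately, the receiver has to pair each video frame with the camera data carrying the same time code. A minimal Python sketch of such pairing follows. It assumes integer time codes that increase over time and a small bounded buffer; the class name `Synchroniser` and its methods are hypothetical and do not come from the application.

```python
class Synchroniser:
    """Hold camera data until the video frame with the same time code is shown."""

    def __init__(self, max_pending=50):
        self.max_pending = max_pending
        self._pending = {}  # time code -> camera data

    def add_camera_data(self, time_code, camera_data):
        """Store camera data from the separate channel, dropping the oldest if full."""
        self._pending[time_code] = camera_data
        while len(self._pending) > self.max_pending:
            del self._pending[min(self._pending)]

    def match(self, frame_time_code):
        """Return (and forget) the camera data for this frame, or None if not received."""
        return self._pending.pop(frame_time_code, None)


sync = Synchroniser()
sync.add_camera_data(1001, {"pan": 0.10, "tilt": 0.02, "zoom": 2.0})
sync.add_camera_data(1002, {"pan": 0.12, "tilt": 0.02, "zoom": 2.0})
print(sync.match(1002))  # camera data for the frame being displayed
print(sync.match(1003))  # None: data for this frame has not arrived
```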

If it is required to identify one or more of the objects such as
players 22, 24 this may be accomplished using, for example, microwave
sensing equipment 200, 202, with each player being equipped with an
appropriate transducer. By triangulation in unit 204, the identity and
position of each player can be determined.
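
The application does not say what the sensing equipment measures. Purely as an illustration, the Python sketch below assumes that each of two sensors at known positions on a flat pitch reports a bearing angle towards a player's transducer, and intersects the two bearing lines. The function name `triangulate` and the coordinate conventions are assumptions.

```python
import math


def triangulate(p1, bearing1, p2, bearing2):
    """Intersect two bearing lines; bearings are in radians from the +x axis.

    Returns the (x, y) intersection, or None if the bearings are parallel.
    """
    d1 = (math.cos(bearing1), math.sin(bearing1))
    d2 = (math.cos(bearing2), math.sin(bearing2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None
    # Solve p1 + t * d1 = p2 + s * d2 for t using 2D cross products.
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t = (rx * d2[1] - ry * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


# Two sensors 100 units apart, both sighting the same transducer.
sensor_a, sensor_b = (0.0, 0.0), (100.0, 0.0)
print(triangulate(sensor_a, math.pi / 4, sensor_b, 3 * math.pi / 4))
```

The identity of each player would come from the transducer's own signal rather than from the geometry, so a real unit would run one such calculation per transducer.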

This information can be combined in a combiner circuit 196 with
the camera data to provide information on the location and identity of each
player. This combined information can be transmitted separately, as
indicated by arrow 198, or could be combined with the video output 184
in combiner 18 as indicated by dotted connection 199.

With reference now to Figure 2, the receiving apparatus comprises
buffer/decoder means 300 for receipt of combined video and camera data
signals.

The signals are decoded into data signals on line 302 and video on
line 309.

The video signals are used to display the video scene on TV
monitor 306.

The camera data signals are fed to a processor 308. A control and
feedback line 310 connects the TV monitor 306 to processor 308.



The processor 308 stores the camera control data relating to the
video frame being displayed and the processor 308 and TV monitor 306
are synchronised via line 310. A keyboard or other input device 312 may
be provided to input video information into processor 308.

Processor 308 can input video information for display on the TV
monitor in accordance with a selectable programmed menu which is
selectable by the viewer.

The viewer can select an object, for example, player 22 or
billboard 29, either by use of a touch screen on the TV monitor or possibly
by use of a laser pointer 314 or positionable x-y cursor, or may identify the
objects by voice commands or any other pointing or identification method.

The viewer can select from the programmable menu to modify the
object or to display data relating to the selected object or to enable further
data or visual effects to be added to an object.

An object can be tracked through a succession of video fields using 
the camera position and orientation and field of view size merely by 
identifying the objects in a first video field. 

If object identification information is also transmitted then it is
possible for the viewer to preselect when an object appears on a video
image and for that object to be highlighted or data added etc.
automatically.

Thus, the transmission to the viewer of camera data synchronised to
a video image enables a plurality of viewers at a plurality of remote
locations to modify the video image.

In a modified version in which the camera data is transmitted
separately from the video data, this data is received in a separate
buffer/decoder 320, the output of which is connected to a control input of
processor 308 to enable processor 308 to modify the video image.

In a further modified version the camera data may be supplied via
an Internet connection, in which case buffer/decoder 320 may also
transmit data received from processor 308 as indicated by outgoing arrows
321, 322. The system can then operate as a true interactive system with
information being transmitted from the viewer to the broadcaster or to an
intermediate party.

The transmission system via, for example, the Internet may also
operate when the camera data is transmitted together with the video data
into decoder 300.

Another preferred embodiment of the present invention will now
be described:

Figure 3 shows a transmission system comprising a video camera
31 which is positioned to video a scene 41 comprising a chess board 42
with playing pieces 43, 44, 45 and an opposing player 46.



The video camera 31 is preferably equipped with camera position
and orientation and field of view size (position in x,y,z, pan, tilt, roll and
zoom) sensor means 32 which senses the required camera position
and orientation and field of view size as described hereinbefore.

The system comprises an object positioning aperture 51 that detects
the position of selected objects in space. The positioning aperture detects the
position of the chess playing pieces and the opposing player.

The system comprises a data base 61 containing general
information, for example historical information regarding the opposing
player. The system also comprises a computerised device 62 that gives,
for example, a computerised solution for the next chess move.

The video output 31, camera position and orientation and
field of view size data 32, objects positioning data 51, relevant
information data 61 and relevant computerised information 62 are
combined together via a combiner circuit 71.

The output of combiner circuit 71 is used as a transmitted video
output 72. Alternatively, the camera position and orientation and
field of view size parameter data 32, objects positioning data 51, relevant
information data 61 and relevant computerised information 62 may be
combined by an alternative combiner circuit and transmitted separately via
output 74, with the video signal 31 transmitted via video output 75.

With reference now to Figure 4, the receiving apparatus comprises
buffer/decoder means 400 for receipt of video, camera position and
orientation and field of view size data, object positioning data, relevant
information data and relevant computerised information either in the
composed format transmitted via output channel 62 or in the separate
channels format transmitted via output channels 63,64.

The signals are decoded into data signals on line 401 and video on 
line 402. 



The video signals are used to display the video scene on TV 
monitor 403. 

The camera position and orientation and field of view size data, 
object positioning data, relevant information data and relevant 
computerised information are fed to a processor 404. A control and 
feedback line 405 connects the TV monitor 403 to processor 404. 

The processor 404 stores the camera position and orientation and 
field of view size data, object positioning data, relevant information data 
and relevant computerised information relating to the video frame being 
displayed and the processor 404 and TV monitor 403 are synchronised via 
line 405. A keyboard or other input device 407 may be provided to input
video information into processor 404. 

Processor 404 can input video information for display on the TV 
monitor in accordance with a selectable programmed menu which is 
selectable by the viewer. 

The viewer can move a playing piece on the chess board, for
example, playing piece 43, either by use of a touch screen on the TV
monitor or possibly by use of a laser pointer 410 or positionable x-y
cursor, or may identify the objects by voice commands or any other
pointing or identification method.



The viewer actions obtained from processor 404 are fed to a viewer
data transmitting apparatus 420, that transmits the relevant information to
the broadcaster for feedback.

In a preferred embodiment, the interactive system operates as 
follows. 

A viewer may be asked to mark an object in the video image, for
example in a television show where the viewer should select the right answer
from multiple choice answers. When the viewer presses on the chosen answer, the
colour of the answer may change to green in case of a correct answer and
to red in case of a wrong answer. Another example can be a mathematics
lesson for children where the viewer should select a pile with the correct
number of balls.

In a preferred embodiment, the interactive system operates as
follows.

A viewer may press on an object in the video image. The image
may be of a three dimensional scene captured in the real world or in a virtual
set. Pressing on the object may be used for activating additional
information. For example, pressing on an historical monument can
present textual information on the monument, or in another case,
pressing on an advertising billboard presented, for example, in sport fields
may give additional information or activate an animation clip inside the
video image. In another example, selecting a specific actor can give
information about his clothing manufacturer, prices, a list of shops where it
is possible to find these clothes or even a direct link to the manufacturer's
web site.

In a preferred embodiment, the interactive system operates as 
follows. 

A viewer may be asked for information, for example in relation to 
a player on a field or possibly a series of questions on the TV screen. By 
clicking on a player or answer the viewer can indicate to the broadcaster a 
selected item. By sending the x-y marked position and the video field
number via an additional communication channel, the apparatus at the
broadcaster or at an intermediate station will know which item is being
selected because the camera parameters for that video frame are known.
Thus, for example, if several panel members are presenters for a program, 
or several artists, the viewer can select by clicking/pointing without 
having to telephone or otherwise communicate via a different medium. 
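
The broadcaster-side lookup described above can be sketched in Python as follows. The sketch assumes the broadcaster already holds, for each video field number, the screen rectangle occupied by each selectable item (something it could derive from the camera parameters for that field). The names `Selection`, `resolve` and `tally`, and the rectangle representation, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    """What the viewer's apparatus sends back: a field number and an x-y position."""
    field_number: int
    x: float
    y: float


def resolve(selection, regions_by_field):
    """Return the item whose rectangle contains the marked point, or None."""
    regions = regions_by_field.get(selection.field_number, {})
    for item, (left, top, right, bottom) in regions.items():
        if left <= selection.x <= right and top <= selection.y <= bottom:
            return item
    return None


def tally(selections, regions_by_field):
    """Count resolved selections per item, for example for active voting."""
    votes = Counter()
    for selection in selections:
        item = resolve(selection, regions_by_field)
        if item is not None:
            votes[item] += 1
    return votes


# Item rectangles per field; the camera moved between fields 100 and 101.
regions_by_field = {
    100: {"A": (0, 0, 100, 50), "B": (0, 60, 100, 110)},
    101: {"A": (10, 0, 110, 50), "B": (10, 60, 110, 110)},
}
received = [Selection(100, 50, 80), Selection(101, 20, 10), Selection(101, 105, 40)]
print(tally(received, regions_by_field))
```

Keying the lookup on the field number is what allows the same x-y position to mean different items at different moments of a moving shot.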

Thus, the interactivity will be immediate. Also, if a viewer clicks 
on a specific player/presenter/artist/answer, information relating to that 
can be presented immediately to the viewer.

CLAIMS

1. An interactive video and television system including video camera 
means comprising one or more video cameras for video 
shooting/recording a scene at an original scene location, means to identify 
the video cameras, means for capturing the camera position and 
orientation and field of view size in each video field, means for identifying 
the location of one or more objects and generating data relating to the 
location of said one or more objects in the captured scene which are 
present in the scene for at least a period of time, means for recording or
transmitting the video recorded images to enable a viewer to view the 
scene at a location remote from the original scene location, said means 
including means for recording or transmitting the data relating to the 
camera's position and orientation and field of view size and the identity 
and location of said one or more objects to enable said viewer at said 
remote location to interrogate said data, means to generate a time code to 
synchronise between the video image and the data related to this image.

2. An interactive video and television system as claimed in claim 1 in 
which the system comprises means to enable the viewer to interact with 
the received data, in particular means to point to objects or identify objects 
appearing in a three dimensional scene, or means to navigate in a three 
dimensional scene. 

3. An interactive video and television system as claimed in claim 1 or 
claim 2 further comprising means to identify marked objects or points and 
track them with time using the received camera position and orientation 
and field of view size.

4. An interactive video and television system as claimed in any one of
claims 1 to 3 in which the system comprises means to present additional
data relating to objects in the video image or to the video image in general
or any other information that was transmitted with the video image, or to
manipulate the video image at the request of the viewer.

5. An interactive video and television system as claimed in any one of 
claims 1 to 4 in which the means for identifying the location of one or 
more objects includes means for generating data relating to the position of
an object in only a first video frame.

6. An interactive video and television system as claimed in claim 5 in 
which the data relating to the location of said one or more objects is 
transmitted together with the video signal. 

7. An interactive video and television system as claimed in claim 6 in 
which the data is transmitted by encoding the data information into the 
image, for example in the vertical blanking interval in each video field. 

8. An interactive video and television system as claimed in claim 6 in
which the data relating to the location of said one or more objects may be 
transmitted separately by external means separate from the video signal 
via, for example, an additional transmission channel. 

9. An interactive video and television system as claimed in any one of
claims 1 to 8 in which the system also comprises receiving apparatus
including a processor unit and a display unit that may be separate or may
preferably be combined into a single unit.

10. An interactive video and television system as claimed in claim 9 in
which the processing unit preferably comprises graphics, video and 
computational capabilities enabling combination of computer generated 
graphics, animation, manipulated video image or any visualisation 
overlays. The processing unit also enables computation of video data 
combined with computed information. 

11. An interactive video and television system as claimed in any one of
claims 1 to 10 in which the system also comprises receiving apparatus 
including a pointing and identification unit comprising a touch screen on 
the TV monitor, a laser pointer, a positionable x-y cursor or a voice 
recognition system. 

12. An interactive video and television system as claimed in any one of 
claims 1 to 11 in which the receiving apparatus can transmit data relating 
to an object displayed on said TV to indicate a preference to a broadcaster 
or intermediate party. 

13. An interactive video and television system as claimed in any one of 
claims 1 to 12 in which hidden data relating to said one or more objects is
transmitted, said hidden data being accessible to a viewer only on selection
of the object by the viewer.